{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "mnist_pytorch.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pKGF1ZLh8GXE"
      },
      "source": [
        "#Training a Neural Network in PyTorch#\n",
        "\n",
        "PyTorch is a powerful, widely-used machine learning library. It has all the latest neural network layers and functions and supports GPU computation.\n",
        "\n",
        "Most neural network research these days either happens in PyTorch or TensorFlow. Google Research develops TensorFlow while Facebook AI Research develops PyTorch. Functionally they are probably very similar, I just know a lot of people who use PyTorch and it seems like it might be somewhat easier for beginners to learn. It is important to have some experience in one of these two frameworks but if you understand the ideas and concepts you shouldn't have too much trouble moving between them.\n",
        "\n",
        "The `torch` library provides the basic functions we need when dealing with `tensors`. `tensors` are a generalization of matrices to arbirtrary numbers of dimensions (a matrix is a 2D tensor). Types of tensors:\n",
        "\n",
        "\n",
        "*   0 dimensional: Scalar\n",
        "*   1 dimensional: Array\n",
        "*   2 dimensional: Matrix\n",
        "*   3 dimensional: Number cubey thingy\n",
        "*   4 dimensional: ?????\n",
        "\n",
        "Anywho, let's start with our opening chant to invoke the power of PyTorch to aid us. The last line is a special call to the GPU gods to smile on our endeavor.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ehrybK-XPJrr"
      },
      "source": [
        "import torch\n",
        "import torchvision\n",
        "import torchvision.transforms as transforms\n",
        "\n",
        "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T7rPSRhhBoQc"
      },
      "source": [
        "##Time For Some Data##\n",
        "\n",
        "PyTorch has some built tools for downloading and loading common datasets. We'll be playing around with MNIST in this example. It is a dataset of 28x28 grayscale handwritten digits 0-9. There are 50,000 images in the training set and 10,000 in the test set.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "iMUkulSBTNwo"
      },
      "source": [
        "def get_mnist_data():\n",
        "  trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True,\n",
        "                                        transform=transforms.ToTensor())\n",
        "  trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True,\n",
        "                                            num_workers=8)\n",
        "\n",
        "  testset = torchvision.datasets.MNIST(root='./data', train=False, download=True,\n",
        "                                      transform=transforms.ToTensor())\n",
        "  testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False,\n",
        "                                          num_workers=8)\n",
        "  classes = range(10)\n",
        "  return {'train': trainloader, 'test': testloader, 'classes': classes}\n",
        "\n",
        "data = get_mnist_data()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xI_PZ2Pus5cF"
      },
      "source": [
        "###Understanding Our Data ###\n",
        "It's worthwhile to check out how our dataloader loads the images into tensors. We can print out the size of the loaded data in the `images` tensors.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "P1HO7vN-T5P4"
      },
      "source": [
        "# get some random training images\n",
        "dataiter = iter(data['train'])\n",
        "images, labels = dataiter.next()\n",
        "print(images.size())"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "C2h61JjW7rh1"
      },
      "source": [
        "Our `images` tensor is 4-dimensional, (32 x 1 x 28 x 28)\n",
        "\n",
        "PyTorch stores image data in (N x C x H x W) format. Thus the size of this tensor implies we have a mini-batch of 32 images, each have a single channel (grayscale), and each image is 28 x 28, so that makes sense!\n",
        "\n",
        "We can use matplotlib to see what our data looks like:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E42h6plq63Ar"
      },
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "\n",
        "def imshow(img):\n",
        "    npimg = img.numpy()\n",
        "    plt.imshow(np.transpose(npimg, (1, 2, 0)))\n",
        "    plt.show()\n",
        "\n",
        "# show images\n",
        "imshow(torchvision.utils.make_grid(images))\n",
        "# print labels\n",
        "print(' '.join('%9s' % data['classes'][labels[j]] for j in range(4)))\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0z2Zs4W74Rcv"
      },
      "source": [
        "##A Simple Network##\n",
        "\n",
        "First we'll build a very basic neural network with a single hidden layer of neurons. This means one fully connected layer of weights connects the input to the hidden neurons and one fully connected layer connects the hidden neurons to the output. We'll use the RELU activation function on the hidden neuron values as our nonlinearity.\n",
        "\n",
        "These fully connected (or `nn.Linear`) layers expect a 2D input tensor where that is N x I where N is the number of data points in a mini batch and I is the number of inputs. However, our data is formatted in (N x C x H x W) right now so we need to tell PyTorch to rearrange it using `torch.flatten`.\n",
        "\n",
        "After being `flatten`ed our data goes from (32 x 1 x 28 x 28) to being (32 x 784)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RfglkbNyVLQY"
      },
      "source": [
        "import torch.nn as nn\n",
        "import torch.nn.functional as F\n",
        "\n",
        "\n",
        "class SimpleNet(nn.Module):\n",
        "    def __init__(self, inputs=28*28, hidden=512, outputs=10):\n",
        "        super(SimpleNet, self).__init__()\n",
        "        self.fc1 = nn.Linear(inputs, hidden)\n",
        "        self.fc2 = nn.Linear(hidden, outputs)\n",
        "\n",
        "    def forward(self, x):\n",
        "        x = torch.flatten(x, 1)\n",
        "        x = self.fc1(x)\n",
        "        x = F.relu(x)\n",
        "        x = self.fc2(x)\n",
        "        return x"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cBv5uxRHrdH_"
      },
      "source": [
        "###The Training Function ###\n",
        "\n",
        "Now for training our network. Our `train` function takes as input the `net` to train and the `dataloader` for the training data. It also takes some optional parameters to control training.\n",
        "\n",
        "For our network we'll be using PyTorch's built in `nn.CrossEntropyLoss`. This will apply a softmax to our network's output, calculate the log-probability assigned to each class, then try to minimize the negative log likelihood of our data (AKA maximize the likelihood)\n",
        "\n",
        "For our optimizer we are using stochastic gradient descent with learning rate, momentum, and decay parameters."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XrzP3OzSWLnV"
      },
      "source": [
        "import torch.optim as optim\n",
        "\n",
        "def train(net, dataloader, epochs=1, lr=0.01, momentum=0.9, decay=0.0, verbose=1):\n",
        "  net.to(device)\n",
        "  losses = []\n",
        "  criterion = nn.CrossEntropyLoss()\n",
        "  optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum, weight_decay=decay)\n",
        "  for epoch in range(epochs):\n",
        "    sum_loss = 0.0\n",
        "    for i, batch in enumerate(dataloader, 0):\n",
        "        # get the inputs; data is a list of [inputs, labels]\n",
        "        inputs, labels = batch[0].to(device), batch[1].to(device)\n",
        "\n",
        "        # zero the parameter gradients\n",
        "        optimizer.zero_grad()\n",
        "\n",
        "        # forward + backward + optimize \n",
        "        outputs = net(inputs)\n",
        "        loss = criterion(outputs, labels)\n",
        "        loss.backward()\n",
        "        optimizer.step()\n",
        "\n",
        "        # print statistics\n",
        "        losses.append(loss.item())\n",
        "        sum_loss += loss.item()\n",
        "        if i % 100 == 99:    # print every 100 mini-batches\n",
        "            if verbose:\n",
        "              print('[%d, %5d] loss: %.3f' %\n",
        "                  (epoch + 1, i + 1, sum_loss / 100))\n",
        "            sum_loss = 0.0\n",
        "  return losses"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7PQs_ofgwmCt"
      },
      "source": [
        "###Training The Network###\n",
        "\n",
        "We'll instantiate a new network and train it on our training data.\n",
        "\n",
        "Our training function prints out some debug information about the epoch, batch number, and current loss values. It also returns a list of all the losses on our mini-batches so we can plot them all once training has finished."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZnniOt6IsJjo"
      },
      "source": [
        "net = SimpleNet()\n",
        "\n",
        "losses = train(net, data['train'])\n",
        "plt.plot(losses)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AbQR1GAlzqQt"
      },
      "source": [
        "##Testing The Network##\n",
        "\n",
        "We trained our network! The loss went down! That's good, right? But how good is our network, exactly?\n",
        "\n",
        "Well, we can try running our network on a few of our test images and see what happens:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jRnA3phakd4j"
      },
      "source": [
        "dataiter = iter(data['test'])\n",
        "images, labels = dataiter.next()\n",
        "images = images[:4]\n",
        "labels = labels[:4]\n",
        "\n",
        "# print images\n",
        "imshow(torchvision.utils.make_grid(images))\n",
        "print('GroundTruth: ', ' '.join('%5s' % data['classes'][labels[j]] for j in range(4)))\n",
        "outputs = net(images.to(device))\n",
        "_, predicted = torch.max(outputs, 1)\n",
        "\n",
        "print('Predicted: ', ' '.join('%5s' % data['classes'][predicted[j]]\n",
        "                              for j in range(4)))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UvR8TTs-z8Cc"
      },
      "source": [
        "Pretty good so far. But we also want to be able to test the network on all of our data. Here's a function that can do just that, computing the accuracy on a full set of data:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "7TNFVUA3mpJr"
      },
      "source": [
        "def accuracy(net, dataloader):\n",
        "  correct = 0\n",
        "  total = 0\n",
        "  with torch.no_grad():\n",
        "      for batch in dataloader:\n",
        "          images, labels = batch[0].to(device), batch[1].to(device)\n",
        "          outputs = net(images)\n",
        "          _, predicted = torch.max(outputs.data, 1)\n",
        "          total += labels.size(0)\n",
        "          correct += (predicted == labels).sum().item()\n",
        "  return correct/total"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_usQ2gGR01j0"
      },
      "source": [
        "Now we can give it a try:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MEl-HH83020a"
      },
      "source": [
        "print(\"Current accuracy: %f\" % accuracy(net, data['train']))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3gIXV-es1T7C"
      },
      "source": [
        "##Experiments##\n",
        "\n",
        "Now it's time to poke around a little bit with our models. First I just want this utility function to do window smoothing of data for us. As you may have noticed, we are doing *stochastic* gradient descent, so our losses for each mini-batch can vary quite dramatically. If we smooth them out a little bit they will be easier to look at when we plot them."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Zq0AprMN30cS"
      },
      "source": [
        "def smooth(x, size):\n",
        "  return np.convolve(x, np.ones(size)/size, mode='same')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TdHueTHS7ClZ"
      },
      "source": [
        "###Learning Rate###\n",
        "\n",
        "Let's experiment around with the learning rate of our model. Changing the learning rate should affect how fast our model converges and how accurate it is. We can see the effect when we plot out the loss function over time for models with different learning rates:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lC3gB5j61j2Z"
      },
      "source": [
        "net_high = SimpleNet()\n",
        "losses_high = train(net_high, data['train'], lr=.1, verbose=0)\n",
        "acc_high = accuracy(net_high, data['test'])\n",
        "plt.plot(smooth(losses_high,20), 'r-')\n",
        "\n",
        "\n",
        "net_mid = SimpleNet()\n",
        "losses_mid = train(net_mid, data['train'], lr=.01, verbose=0)\n",
        "acc_mid = accuracy(net_mid, data['test'])\n",
        "plt.plot(smooth(losses_mid,20), 'b-')\n",
        "\n",
        "\n",
        "net_low = SimpleNet()\n",
        "losses_low = train(net_low, data['train'], lr=.001, verbose=0)\n",
        "acc_low = accuracy(net_low, data['test'])\n",
        "plt.plot(smooth(losses_low,20), 'g-')\n",
        "\n",
        "print(acc_high, acc_mid, acc_low)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-aM_X31X7gxf"
      },
      "source": [
        "###Momentum###\n",
        "\n",
        "We are using the default value for momentum of `0.9`. Fix your value for the learning rate and try varying the values for momentum.\n",
        "\n",
        "####**Question 1: What affect does changing the value for momentum have on your model's convergence and final accuracy?**####\n",
        "\n",
        "TODO: Answer here"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "X5rsSpoP78KM"
      },
      "source": [
        "#TODO: Write your code here for experimenting with different values of momentum."
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "weTQgPBE8GC-"
      },
      "source": [
        "###Weight Decay###\n",
        "\n",
        "Right now we aren't using any weight decay with our model. However, it can be useful as a method of regularization if we are worried about overfitting.\n",
        "\n",
        "Take your best performing model parameters from above for learning rate and momentum. Fix these parameters as you answer the following questions:\n",
        "\n",
        "####**Question 2: Is our current model overfit or underfit to our training data? How can you tell?**####\n",
        "\n",
        "TODO: Answer here\n",
        "\n",
        "####**Question 3: Try out some different values for weight decay. What effect do they have on model convergence? What about final accuracy? Does this match with what you would have expected? Why or why not?**####\n",
        "\n",
        "TODO: Answer here"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Jywu7uy0aLa8"
      },
      "source": [
        "#TODO: Write your code here for experimenting with different weight decay"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mKmBp4tMaYro"
      },
      "source": [
        "###A Bigger Model###\n",
        "\n",
        "Before we used a very simple model but now it's time to try adding some complexity. Create a network that takes as input the 28x28 image, 10 outputs, and any number of layers as long as it has fewer than 2,000,000 connections. Our simple network before had 784\\*512 + 512\\*10 = 406,528 connections.\n",
        "\n",
        "Use only fully connected (`nn.Linear`) layers (we'll get to other layer types soon). However, play around with different [activation functions](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions), [losses](https://pytorch.org/docs/stable/nn.html#loss-functions), and hyperparameter settings. You can also try different training regimes. For example, you could try lowering the learning rate during training by calling our training method twice like this:\n",
        "\n",
        "    train(net, data['train'], epochs=5, lr=.1)\n",
        "    train(net, data['train'], epochs=3, lr=.01)\n",
        "    train(net, data['train'], epochs=2, lr=.001)\n",
        "\n",
        "Maybe it works better? Why did I choose those particular numbers? Who knows! It's deep learning, no one really knows what will work you have to just try things and see.\n",
        "\n",
        "Experiment with different network architectures and settings to get the most accurate model.\n",
        "\n",
        "####**Question 4: Describe your final model architecture. How did you come up with the number of layers and number of neurons at each layer?**####\n",
        "\n",
        "TODO: Answer here\n",
        "\n",
        "####**Question 5: What hyperparameters did you experiment with? What values were good for them? Do you think your model was over or under fitting the data?**####\n",
        "\n",
        "TODO: Answer here"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "J0yynxFmd0ss"
      },
      "source": [
        "#TODO: Write your new model and experiments here\n",
        "\n",
        "class ExperiNet(nn.Module):\n",
        "    #TODO: Change all this\n",
        "    def __init__(self):\n",
        "        super(ExperiNet, self).__init__()\n",
        "        self.fc1 = nn.Linear(28*28, 512)\n",
        "        self.fc2 = nn.Linear(512, 10)\n",
        "\n",
        "    def forward(self, x):\n",
        "        x = torch.flatten(x, 1)\n",
        "        x = self.fc1(x)\n",
        "        x = F.relu(x)\n",
        "        x = self.fc2(x)\n",
        "        return x"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ba1kcHtIf-Gg"
      },
      "source": [
        "##Download and submit!##\n",
        "\n",
        "Download your iPythorn notebook from Colab to your `hw0` directory. Then follow the instructions to collate and submit your homework."
      ]
    }
  ]
}